Plotting TFR Estimates
Introduction
In this short workbook, we outline how using R we can visualise and compare the various computations of the Palau TFR overtime. The data source is the World Population Prospects. What we aim to do is compare the different estimates with the aggregate WPP estimates.
Loading the Packages
library(tidyverse)
library(ggthemes)
library(plotly)
library(gt)Loading the data
Next we load our two .csv data into R as two tibble objects. One, “e_tfr” is a compilation of all the different estimates for TFR by data source and method. The tibble has four columns: “data_souce, estimate_method, estimated_tfr, estimated_year”. The other, “wpp_tfr” is the WPP approximations for TFR 1960-2020 @unitednations2022
# load the estimated tfr
e_tfr = read_csv("data/tfr_estimates.csv") %>%
as_tibble()
# load the wpp tfr
wpp_tfr = e_tfr %>% filter(data_source == "WPP")Source Table
# make a table of unique combos of Source + method
unique_combinations = unique(e_tfr[c("estimate_method", "data_source")])
unique_combinations = unique_combinations[order(unique_combinations$data_source, unique_combinations$estimate_method), ] %>%
rename(`Estimate Method` = estimate_method, `Data Source` = data_source)
# export as a gt table and set column background colors
unique_combinations %>%
gt() %>%
tab_style(style = cell_fill(color = "#1F78B4"),
locations = cells_column_labels(columns = 1:2))| Estimate Method | Data Source |
|---|---|
| Reverse survival method | 1958 Census |
| Own-children method | 1967 Census |
| Reverse survival method | 1967 Census |
| Reverse survival method | 1970 Census |
| Own-children method | 1973 Census |
| Reverse survival method | 1973 Census |
| Own-children method | 1980 Census |
| Recent births | 1980 Census |
| Reverse survival method | 1980 Census |
| Reverse survival method | 1986 Census |
| Own-children method | 1990 Census |
| Reverse survival method | 1990 Census |
| Cohort-completed fertility backdated by the mean age of childbearing | 1995 Census |
| Own-children method | 1995 Census |
| Reverse survival method | 1995 Census |
| Cohort-completed fertility backdated by the mean age of childbearing | 2000 Census |
| Own-children method | 2000 Census |
| Reverse survival method | 2000 Census |
| Cohort-completed fertility backdated by the mean age of childbearing | 2005 Census |
| Own-children method | 2005 Census |
| Reverse survival method | 2005 Census |
| Own-children method | 2012 Mini-Census |
| Reverse survival method | 2012 Mini-Census |
| Reverse survival method | 2015 Census |
| Recent births | 2020 Census |
| Computed rate from DYB/WPP | Register |
| Computed rate from NSO/WPP | Register |
| Direct | Register |
| Estimate | WPP |
Static Plots
Now, we use ggplot2() to compute the first static data visualisation. This shows all the different estimated TFR from the ‘e_tfr’ tibble.
# create the scatter plot
est = ggplot(e_tfr) +
geom_line(aes(x = estimated_year,
y = estimated_tfr,
color = interaction(estimate_method, data_source))) +
labs(x = "Estimated Year",
y = "Estimated TFR",
title = "Plot of TFR Estimates of Palau 1950-2020",
color = "") +
theme_solarized_2() +
theme(legend.position = "bottom",
legend.box = "vertical",
legend.margin = margin()) +
guides(color = guide_legend(nrow = 18, byrow = TRUE))
estThe next visualisation shows the estimates with just the WPP approximations
# create the scatter plot
wpp = ggplot(wpp_tfr, aes(x = estimated_year,
y = estimated_tfr,
color = "black"))+
geom_line() +
labs(x = "Estimated Year",
y = "Estimated TFR",
title = "WPP TFR Estimates for Palau 1950-2020",
color = "") +
theme_solarized_2() +
theme(legend.position = "none")
wppMake interactive
Using the ggplotly function from the plotly package, we can make the plot interactive so it is possible to select which data you want to see and zoom + export a custom png. This is very challenging however and could also probably be achieved in Shiny with more ease (though I would have to host a backend which can be £££)
# set plotly boundaries
est_plotly = ggplotly(est, width = 800, height = 2000)
# Create a list of buttons
buttons = list()
# access the levels of the color scale for the correct order
unique_combinations = apply(unique_combinations, 1,
function(row) paste(row, collapse = "-"))
# create a button for each level in the correct order
for (i in seq_along(unique_combinations)) {
visibility_vals = rep(FALSE, length(unique_combinations))
visibility_vals[i] = TRUE
buttons[[length(buttons) + 1]] = list(
method = "restyle",
args = list("visible", visibility_vals),
label = unique_combinations[i]
)
}
# add a button to show all interactions
buttons[[length(buttons) + 1]] = list(
method = "restyle",
args = list("visible", rep(TRUE, length(unique_combinations))),
label = "Show All"
)
# adjust layout
est_plotly = est_plotly %>%
layout(
showlegend = FALSE,
updatemenus = list(
list(
type = "buttons",
direction = "down",
y = -0.1,
x = 0,
xanchor = 'left',
yanchor = 'top',
buttons = buttons,
tickfont = list(size = 5), # adjust button font size
pad = list(r = 10, t = 10), # adjust padding
showactive = TRUE
)
),
margin = list(
b = 800 # adjust bottom margin to make space for buttons
)
)
# display the plot
est_plotlyDifference WPP and Estimates
Now I want to compute and visualise the difference between the WPP line and the different estimates
# drop WPP from data
calc_tfr = e_tfr %>% filter(data_source != "WPP")
# change cols in wpp
wpp_tfr = wpp_tfr %>% select(estimated_year, estimated_tfr) %>%
rename(estimated_year = estimated_year, wpp_tfr = estimated_tfr)
# calc difference
calc_tfr = calc_tfr %>%
left_join(wpp_tfr, by = "estimated_year") %>%
mutate(difference = estimated_tfr - wpp_tfr)Line of Best Fit Scatter
We can visualise this data per year of estimate as a simple point and line of best fit graph
# calculate lobf
difference_model = lm(difference ~ estimated_year, data = calc_tfr)
# extract r squared
r_squared = summary(difference_model)$r.squared
# plot graph
ggplot(calc_tfr, aes(x = estimated_year,
y = difference,
color = estimate_method,
shape = factor(data_source),
group = 1)) +
geom_point() +
geom_smooth(method = "lm",
se = FALSE,
color = "red",
alpha = 0.3) +
scale_shape_manual(values = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) +
labs(title = "Scatter Plot with Line of Best Fit",
color = "Estimate Method",
shape = "Data Source") +
xlab("Year of Estimate") +
ylab("Difference better Estimated TFR and WPP estimate") +
geom_text(aes(x = max(estimated_year), y = min(difference),
label = paste("R\u00B2 =", round(r_squared, 2))),
hjust = 1, vjust = -1, color = "dark red") +
geom_text(aes(x = max(estimated_year), y = min(difference) + 0.1,
label = paste("Regression Formula: ",
round(coefficients(difference_model)[1], 2),
round(coefficients(difference_model)[2], 2),
"x")),
hjust = 1, vjust = -1.5, color = "dark red") +
theme_solarized_2()## `geom_smooth()` using formula = 'y ~ x'